The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
对比度学习(CL)在任何监督的多级分类或无监督的学习中显示出令人印象深刻的图像表示学习进步。但是,这些CL方法无法直接适应多标签图像分类,因为难以定义正面和负面实例以对比多标签场景中给定的锚图像对比给定的锚图像,让标签单独丢失,这意味着借用了借用的标签通常,从对比度多级学习来定义它们的常用方式将产生许多不利的虚假负面实例。在本文中,通过引入标签校正机制来识别缺失的标签,我们首先优雅地产生了锚映像的单个语义标签的阳性和负面因素,然后定义了带有缺少标签的多标签图像分类的独特对比度损失(CLML) ),损失能够准确地使图像接近其真实的正面图像和虚假的负面图像,远离其真实的负面图像。与现有的多标签CL损失不同,CLML还保留了潜在表示空间中的低排名全球和局部标签依赖关系,在这些空间中,已证明此类依赖性有助于处理缺失的标签。据我们所知,这是在缺失标签方案中的第一个一般多标签CL损失,因此可以通过单个超参数与任何现有多标签学习方法的损失无缝配对。已提出的策略已被证明可以在三个标准数据集(MSCOCO,VOC和NUS范围内)提高RESNET101模型的分类性能,分别为1.2%,1.6%和1.3%。代码可在https://github.com/chuangua/contrastivelossmlml上找到。
translated by 谷歌翻译
积极的未标记(PU)学习旨在仅从积极和未标记的数据中学习二进制分类器,这在许多现实世界中都被使用。但是,现有的PU学习算法无法在开放且不断变化的情况下应对现实世界中的挑战,在这种情况下,未观察到的增强类的示例可能会在测试阶段出现。在本文中,我们通过利用来自增强类分布的未标记数据来提出一个通过增强类(PUAC)进行PU学习的无偏风险估计器,在许多现实世界中,可以轻松收集这些数据。此外,我们得出了针对拟议估计器的估计误差,该估计量为其融合到最佳解决方案提供了理论保证。多个现实数据集的实验证明了拟议方法的有效性。
translated by 谷歌翻译
计算机断层扫描(CTA)图像上的三维(3D)肾脏解析具有极大的临床意义。肾脏,肾肿瘤,肾静脉和肾动脉的自动分割在基于手术的肾癌治疗方面受益匪浅。在本文中,我们提出了一个新的NNHRA-UNET网络,并使用一个基于它的多阶段框架来细分肾脏的多结构并参加KIPA2022挑战。
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Nearest-Neighbor (NN) classification has been proven as a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false prediction if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support data based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show our efficient non-learning based method can outperform or at least comparable to SOTA methods which need additional learning steps.
translated by 谷歌翻译